Abstract. Most proof systems for concurrent programs [Jon83] [O'H07] [VP07] assume the underlying memory model to be sequentially consistent (SC), an assumption that does not hold for modern multicore processors. For performance reasons, these processors implement relaxed memory models. As a result of this relaxation, a program proved correct on the SC memory model might execute incorrectly. To ensure its correctness under relaxation, fence instructions are inserted in the code. In this paper we show that the SC proof of correctness of an algorithm, carried out in the proof system of [Sou84], identifies per-thread instruction orderings sufficient for this SC proof. Further, to correctly execute this algorithm on an underlying relaxed memory model, it is sufficient to respect only these orderings by inserting fence instructions.



1 Introduction



The memory model of a processor defines the order in which memory operations, issued by a single processor, appear to execute from the point of view of the memory subsystem. In a broad sense, it determines whether any two memory access instructions issued by a processor within a single thread can be reordered. The sequentially consistent (SC) memory model is the simplest but most restrictive of all and does not allow any reordering of instructions within a thread. Modern multicore processors, in order to hide latencies, implement relaxed memory models and allow instructions within a thread to be reordered as long as they operate on different memory addresses. For example, Total Store Order (TSO), the memory model of x86 processors, allows a write to be reordered with later reads, provided they operate on different memory locations. As a result of these relaxations, a program may exhibit more behaviours than under SC, and some of these extra behaviours may violate a property that holds under SC. Peterson's mutual exclusion algorithm in Figure 1 illustrates this. The algorithm satisfies the mutual exclusion property under the SC model, but executing it on an Intel x86 processor may violate this property. This can happen if the read of flag2 at label 3 is reordered before the writes at labels 1 and 2. With such a reordering Proc1 can enter the critical section, and Proc2, with or without the same reordering, can enter its critical section simultaneously. This example clearly shows the effect of the memory model on the correctness of an algorithm.



Proc1                                   Proc2
1. flag1 := true                        6. flag2 := true
2. turn := 2                            7. turn := 1
3. while (flag2 && turn = 2) do od      8. while (flag1 && turn = 1) do od
   Critical Section                        Critical Section
4. flag1 := 0                           9. flag2 := 0

Fig. 1. Peterson's mutual exclusion algorithm
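To make the reordering concrete, the following Python sketch exhaustively explores the entry protocol of Figure 1 under a simplified TSO-style model. It assumes one FIFO store buffer per thread, loads that first consult the thread's own buffer, and a nondeterministic flush of the oldest buffered store; the loop test at labels 3 and 8 is evaluated only once. The instruction encoding and the names `explore`, `peterson_entry` and `both_enter` are illustrative assumptions, not part of the paper's formalism.

```python
# Illustrative sketch only: straight-line threads under a TSO-style store buffer.


def set_at(tup, i, value):
    """Return a copy of tuple `tup` with position i replaced by `value`."""
    return tup[:i] + (value,) + tup[i + 1:]


def explore(threads, init):
    """Enumerate final register states reachable with one FIFO buffer per thread.

    Instruction encoding (assumed for this sketch):
      ("w", var, value)  buffered store of a constant
      ("r", var, reg)    load into a thread-local register
      ("fence",)         enabled only when the thread's buffer is empty
    """
    finals, seen = set(), set()

    def run(pcs, bufs, mem, regs):
        key = (pcs, bufs, tuple(sorted(mem.items())), regs)
        if key in seen:
            return
        seen.add(key)
        moved = False
        for i, prog in enumerate(threads):
            if bufs[i]:  # flush: commit the oldest buffered store to memory
                (var, val), rest = bufs[i][0], bufs[i][1:]
                run(pcs, set_at(bufs, i, rest), {**mem, var: val}, regs)
                moved = True
            if pcs[i] == len(prog):
                continue
            op, nxt = prog[pcs[i]], set_at(pcs, i, pcs[i] + 1)
            if op[0] == "w":  # the store goes to the tail of the buffer
                run(nxt, set_at(bufs, i, bufs[i] + ((op[1], op[2]),)), mem, regs)
                moved = True
            elif op[0] == "r":  # newest own buffered store wins, else memory
                own = [v for (x, v) in bufs[i] if x == op[1]]
                val = own[-1] if own else mem[op[1]]
                run(nxt, bufs, mem, set_at(regs, i, regs[i] + ((op[2], val),)))
                moved = True
            elif not bufs[i]:  # fence
                run(nxt, bufs, mem, regs)
                moved = True
        if not moved:  # all threads finished and all buffers drained
            finals.add(regs)

    n = len(threads)
    run((0,) * n, ((),) * n, dict(init), ((),) * n)
    return finals


def peterson_entry(me, other, fenced):
    """Labels 1-3 (or 6-8) of Figure 1, with the loop test read once."""
    prog = [("w", f"flag{me}", 1), ("w", "turn", other)]
    if fenced:
        prog.append(("fence",))
    return prog + [("r", f"flag{other}", "f"), ("r", "turn", "t")]


def both_enter(fenced):
    """True if some execution lets both threads pass their entry test."""
    threads = [peterson_entry(1, 2, fenced), peterson_entry(2, 1, fenced)]
    for regs in explore(threads, {"flag1": 0, "flag2": 0, "turn": 0}):
        r1, r2 = dict(regs[0]), dict(regs[1])
        enter1 = not (r1["f"] == 1 and r1["t"] == 2)
        enter2 = not (r2["f"] == 1 and r2["t"] == 1)
        if enter1 and enter2:
            return True
    return False


print(both_enter(fenced=False))  # True: mutual exclusion can be violated
print(both_enter(fenced=True))   # False
```

The two runs differ only in a fence between the write to turn and the subsequent reads, which is the kind of per-thread write-to-read ordering this paper aims to identify from an SC proof.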



This problem arises because instruction reordering generates an extra execution that is not possible under the SC memory model. The problem does not arise for a data-race-free program if the underlying relaxed memory model satisfies the data-race-freedom (DRF) property. It can be shown that any data-race-free program executed on a memory model satisfying DRF exhibits exactly the same set of behaviours as under the SC memory model. A program is data race free if, in every execution of the program, any pair of conflicting accesses (write-write or write-read) to the same variable by two different threads is separated by an unlock instruction. Therefore, a data-race-free program that is correct under SC is guaranteed to execute correctly on a memory model that satisfies the DRF property. However, the algorithms we are interested in (lock-free and wait-free algorithms, and lock implementations) do not use lock/unlock and hence do not fit this definition of data race freedom. Therefore, their correctness under a relaxed memory model does not follow from their correctness under the SC memory model.

Another way to avoid extra executions is to prevent reorderings by placing a special instruction, fence, after every instruction in each thread. A fence placed between two instructions in a thread prohibits their reordering. Executing a fully fenced program (one with a fence after every instruction in every thread) on a relaxed memory model generates exactly the same set of executions as under SC, and therefore correctness under SC implies correctness under the relaxed memory model. However, this trivial placement strategy negates the performance benefits of relaxed memory models. An ideal placement of fence instructions should therefore preserve only those program orders that are sufficient to prove the properties of interest.

In this paper, we deal with parallel programs that satisfy some property under SC but are not data race free. The main contribution of this paper is to show that the proof of correctness of such a program under SC is useful for identifying per-thread instruction orderings sufficient to make the program correct on a relaxed memory model. Further, the locations of fence instructions can be inferred from these orderings and the underlying memory model. We are not aware of any other attempt to use the SC proof of correctness of a concurrent program for fence inference.



2 Related Work 

All existing approaches for inferring fences for relaxed memory models can be divided into two main categories: model checking based approaches [HR07] [KVY11] [LW11] [AAC+12b] and proof system based approaches [Rid10] [Bor12] [BD12]. Model checking based approaches first explore the state space of a program under a given memory model using a buffer-based operational semantics and check the reachability of erroneous states. Once a reachable erroneous state is identified, the path leading to it is ruled out by inserting fences at appropriate places. [ABBM12] and [ABBM10] showed that the state reachability problem for the TSO and PSO memory models is decidable for finite-state programs, and that this problem becomes undecidable as soon as read-after-write reordering is added to the memory model. By its nature, this approach is better suited to programs with finite data domains. We are aware of only one line of work [AAC+12a] that combines predicate abstraction with model checking to verify and correct programs with infinite data domains, such as Lamport's bakery algorithm.

The second approach is to use a memory-model-specific proof system, as done in [Rid10]. [Rid10] presents a separation logic based proof system for the TSO memory model and shows that Simpson's 4 slot algorithm does not satisfy the interference freedom property. Recently, [Bor12] and [BD12] looked at the use of separation logic derived proof systems for verifying concurrent data structures on POWER/ARM based memory models. These memory models are more complex than TSO or PSO, mainly because of non-atomic writes. Unlike the approaches of [Rid10], [Bor12] and [BD12], we do not propose a memory-model-specific proof system; we only look at the proof of correctness under the SC memory model and use it to infer the orderings sufficient for correctness. Unlike model checking approaches, proof system based approaches can establish more than just (un)reachability of error states. This is evident in the example of Simpson's 4 slot algorithm, where apart from interference freedom we also prove that the sequence of values observed by the reader is consistent, i.e. it forms a stuttering sequence of the values written by the writer. We are not aware of any line of work that handles fence inference for Simpson's 4 slot algorithm under the PSO memory model with respect to both the interference freedom and the consistent reads properties.

3 Language: Syntax and Semantics 

Figure 2 shows the syntax of a simple parallel programming language without support for dynamic thread creation. The operator || composes a finite number of programs in parallel. We explicitly distinguish local variables, ranged over by lvar and accessed only within a thread, from shared variables, ranged over by shvar and accessed by more than one thread. A local expression, ranged over by lexp, is constructed using only local variables, values and operators. A shared expression, ranged over by shexp, is constructed from exactly one shared variable, optionally combined with a local expression through an operator. The assignment command only allows



Program ::= Cmd; end
Cmd     ::= shvar := lexp | lvar := lexp | lvar := shexp | Cmd1; Cmd2
          | Program1 || ... || Programn
          | if lexp then Cmd1 else Cmd2 | while lexp do Cmd od | skip | fence
shexp   ::= shvar | shvar ⊕ lexp
lexp    ::= lvar | lexp ⊕ lexp | val
val     ::= ℕ | 𝔹 | Tid
⊕       ::= + | − | × | ÷

Fig. 2. A simple concurrent programming language with barrier support



assigning a local expression to a local variable, a shared expression to a local variable, or a local expression to a shared variable. An assignment of a shared expression to a shared variable can be broken down into assignments of these forms; for example, x := y + 1 with x and y shared can be written as l := y + 1; x := l for a fresh local variable l. Because of this restriction, every assignment command either reads at most one shared variable or writes at most one shared variable, but not both. This guarantees at most one memory load or store event per assignment, which simplifies reasoning about the memory model and its events.

3.1 SC Semantics 

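Figure 3 shows the semantics of this language under SC. A state or configuration under SC is of the form $(\mathcal{G}, \mathit{Tstore})$, where

$$\mathcal{G} \stackrel{\text{def}}{=} \mathit{shvar} \to \mathit{val}, \qquad \mathit{Tstore} \stackrel{\text{def}}{=} \mathit{Tid} \rightharpoonup \mathcal{L} \times \mathit{Program}, \qquad \mathcal{L} \stackrel{\text{def}}{=} \mathit{lvar} \to \mathit{val}$$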

The local store $\mathcal{L}$ and the global store $\mathcal{G}$ map local and shared variables, respectively, to their values. Each thread is represented by a unique thread id. The thread store maps a thread id to its local store and the program to be executed next. In our operational semantics we use the set representation of this function, i.e. Tstore is a set of tuples of the form $(t, \mathcal{L}, C)$. The function $[\![-]\!]_{\mathcal{G},\mathcal{L}} \in \mathit{exp} \to \mathcal{G} \to \mathcal{L} \to \mathit{val}$, defined by

$$[\![\mathit{shvar}]\!]_{\mathcal{G},\mathcal{L}} = \mathcal{G}(\mathit{shvar}), \qquad [\![\mathit{lvar}]\!]_{\mathcal{G},\mathcal{L}} = \mathcal{L}(\mathit{lvar}), \qquad [\![e_1 \oplus e_2]\!]_{\mathcal{G},\mathcal{L}} = [\![e_1]\!]_{\mathcal{G},\mathcal{L}} \oplus [\![e_2]\!]_{\mathcal{G},\mathcal{L}},$$

takes an expression and evaluates it to a value based on the mapping of variables in the global store $\mathcal{G}$ and the thread-local store $\mathcal{L}$. This function is then used to define the reference semantics in Figure 3. For any function $\mathcal{F} : A \to B$, $a' \in A$ and $b' \in B$, we write $\mathcal{F}[a' := b']$ to denote the function that is the same as $\mathcal{F}$ everywhere except at $a'$, where it evaluates to $b'$. The semantic rules for the conditional and looping constructs (ITE-T, ITE-F, WHL-T, WHL-F), local variable reads and writes (LRW), and global variable reads and writes (GR, GW) are straightforward.
In the parallel composition command, the parent thread stops its execution and 



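$$\frac{C = \mathit{shvar} := \mathit{lexp};C' \qquad [\![\mathit{lexp}]\!]_{\mathcal{G},\mathcal{L}} = v \qquad \mathcal{G}' = \mathcal{G}[\mathit{shvar} := v]}{(\mathcal{G}, \{(t, \mathcal{L}, C)\} \cup T) \to (\mathcal{G}', \{(t, \mathcal{L}, C')\} \cup T)}\ (\mathrm{GW})$$

$$\frac{C = \mathit{lvar} := \mathit{lexp};C' \qquad [\![\mathit{lexp}]\!]_{\mathcal{G},\mathcal{L}} = v \qquad \mathcal{L}' = \mathcal{L}[\mathit{lvar} := v]}{(\mathcal{G}, \{(t, \mathcal{L}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}', C')\} \cup T)}\ (\mathrm{LRW})$$

$$\frac{C = \mathit{lvar} := \mathit{shexp};C' \qquad [\![\mathit{shexp}]\!]_{\mathcal{G},\mathcal{L}} = v \qquad \mathcal{L}' = \mathcal{L}[\mathit{lvar} := v]}{(\mathcal{G}, \{(t, \mathcal{L}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}', C')\} \cup T)}\ (\mathrm{GR})$$

$$\frac{C = \mathrm{join}(\mathit{lvar});C' \qquad [\![\mathit{lvar}]\!]_{\mathcal{G},\mathcal{L}} = t' \in \mathit{Tid} \qquad t' \notin \mathrm{dom}(T) \cup \{t\}}{(\mathcal{G}, \{(t, \mathcal{L}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}, C')\} \cup T)}\ (\mathrm{Join})$$

$$\frac{[\![\mathit{lexp}]\!]_{\mathcal{G},\mathcal{L}} = \mathit{true} \qquad C = \mathbf{if}\ \mathit{lexp}\ \mathbf{then}\ \mathit{Cmd}_1\ \mathbf{else}\ \mathit{Cmd}_2;C'}{(\mathcal{G}, \{(t, \mathcal{L}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}, \mathit{Cmd}_1;C')\} \cup T)}\ (\mathrm{ITE\text{-}T})$$

$$\frac{[\![\mathit{lexp}]\!]_{\mathcal{G},\mathcal{L}} = \mathit{false} \qquad C = \mathbf{if}\ \mathit{lexp}\ \mathbf{then}\ \mathit{Cmd}_1\ \mathbf{else}\ \mathit{Cmd}_2;C'}{(\mathcal{G}, \{(t, \mathcal{L}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}, \mathit{Cmd}_2;C')\} \cup T)}\ (\mathrm{ITE\text{-}F})$$

$$\frac{[\![\mathit{lexp}]\!]_{\mathcal{G},\mathcal{L}} = \mathit{true} \qquad C = \mathbf{while}\ \mathit{lexp}\ \mathbf{do}\ \mathit{Cmd}\ \mathbf{od};C'}{(\mathcal{G}, \{(t, \mathcal{L}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}, \mathit{Cmd};C)\} \cup T)}\ (\mathrm{WHL\text{-}T})$$

$$\frac{[\![\mathit{lexp}]\!]_{\mathcal{G},\mathcal{L}} = \mathit{false} \qquad C = \mathbf{while}\ \mathit{lexp}\ \mathbf{do}\ \mathit{Cmd}\ \mathbf{od};C'}{(\mathcal{G}, \{(t, \mathcal{L}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}, C')\} \cup T)}\ (\mathrm{WHL\text{-}F})$$

$$\frac{C \in \{\mathbf{skip};C',\ \mathbf{fence};C'\}}{(\mathcal{G}, \{(t, \mathcal{L}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}, C')\} \cup T)}\ (\mathrm{SKP\text{-}SYC})$$

$$\frac{C = \mathbf{end}}{(\mathcal{G}, \{(t, \mathcal{L}, C)\} \cup T) \to (\mathcal{G}, T)}\ (\mathrm{END})$$

$$\frac{C = \mathit{Program}_1 \parallel \cdots \parallel \mathit{Program}_n;C' \qquad t_1, \ldots, t_n \notin \mathrm{dom}(T) \cup \{t\} \qquad T' = T \cup \{(t_1, \emptyset, \mathit{Program}_1), \ldots, (t_n, \emptyset, \mathit{Program}_n)\}}{(\mathcal{G}, \{(t, \mathcal{L}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}, \mathrm{join}(t_1);\cdots;\mathrm{join}(t_n);C')\} \cup T')}\ (\mathrm{PARCOMP})$$

Fig. 3. Reference semantics (SC) for the programming language of Figure 2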



waits for all child threads to finish their execution. This is achieved by adding a join() command to the parent thread for each spawned thread. The rule for the join(tid) command ensures that this command can be executed only after the thread corresponding to tid has finished its execution, so the parent thread waits for the completion of its child threads before continuing. Under SC semantics, fence and skip behave as no-ops. Following the syntax of Figure 2, a process (or thread) $\mathit{Proc}_i$ executes exactly one program $\mathit{Program}_i$; we therefore sometimes use $\mathit{Proc}_i$ and $\mathit{Program}_i$ interchangeably.



4 Logic 

In the proof system of [Sou84], every process $\mathit{Proc}_i$ executing a program $\mathit{Program}_i$ has a history variable $h_{\mathit{Proc}_i}$ that records the interaction of this process with shared variables in terms of the values read from and written to them. $h_{\mathit{Proc}_i}$ is a sequence of elements of the form $(?\mathit{shvar}, ph)$ or $(!\mathit{shvar}, v)$. The element $(?\mathit{shvar}, ph)$ is appended to the sequence when $\mathit{Proc}_i$ reads a value from the shared variable shvar. Similarly, $(!\mathit{shvar}, v)$ is appended when $\mathit{Proc}_i$ writes a value $v$ to the shared variable shvar. Local reasoning about a program $\mathit{Program}_i$ generates a triple of the form $\{P_i\}\ \mathit{Program}_i\ \{Q_i\}$, where $P_i$ and $Q_i$ are assertions on the local state of the process as well as on the history $h_{\mathit{Proc}_i}$. The rest of this section describes the axioms of this proof system, explaining local reasoning in terms of the history variable and the parallel composition rule in terms of the Compat predicate.

Axioms. In our programming language all program constructs except GR and GW operate on local expressions and therefore do not read from or write to shared variables. The proof rules for these constructs are therefore the same as Hoare's axioms in the sequential setting. In the following proof rules, the notation $P[Q/Q'][R/R']$ denotes the simultaneous substitution of $Q'$ for $Q$ and $R'$ for $R$ in $P$. The operator "." appends an element to a sequence, and $\epsilon$ is the empty sequence. Given a sequence $\sigma$, $|\sigma|$ is its length and $\sigma[i]$ is its $i$-th element. The proof rule for assignment to a shared variable is as follows:

$$\{P[h_{\mathit{Proc}_i} / h_{\mathit{Proc}_i}.(!\mathit{shvar}, \mathit{lexp})]\}\ \mathit{shvar} := \mathit{lexp}\ \{P\} \qquad (\mathrm{GWrite})$$

As a result of this write, the history is extended with the element $(!\mathit{shvar}, \mathit{lexp})$. The proof rule for reading a shared variable, $\mathit{lvar} := \mathit{shexp}_j$, is

$$\{\forall ph.\ P[h_{\mathit{Proc}_i} / h_{\mathit{Proc}_i}.(?\mathit{shvar}_j, ph)][\mathit{lvar} / \mathit{shexp}_j[\mathit{shvar}_j / ph]]\}\ \mathit{lvar} := \mathit{shexp}_j\ \{P\} \qquad (\mathrm{GRead})$$
This rule needs the value of the shared variable $\mathit{shvar}_j$ to evaluate the expression $\mathit{shexp}_j$, but while reasoning locally we do not know this value beforehand. Therefore, instead of the actual value of $\mathit{shvar}_j$, a placeholder variable $ph$ is used, and this fact is recorded in $h_{\mathit{Proc}_i}$ by appending $(?\mathit{shvar}_j, ph)$. The expression $\mathit{shexp}_j$ is then evaluated with $ph$ in place of $\mathit{shvar}_j$ before being assigned to lvar in the assertion. Here $ph$ is universally quantified over all possible values.
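A minimal Python sketch of how GWrite and GRead grow a process history during local reasoning: writes append $(!\mathit{shvar}, v)$, and reads append $(?\mathit{shvar}, ph)$ with a fresh placeholder $ph$ that is also bound to the local variable. The class name `LocalHistory`, the tuple encoding of history elements and the symbolic form `("+", ph, 1)` for $ph + 1$ are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class LocalHistory:
    """Symbolic local run of one process that records its history h_Proc.

    Hypothetical helper for illustration; not part of the proof system itself.
    """
    local: dict = field(default_factory=dict)    # local store: lvar -> value
    history: list = field(default_factory=list)  # ("!"|"?", shvar, value|ph)
    fresh: count = field(default_factory=count)  # supply of placeholder indices

    def write(self, shvar, value):
        # GWrite: the history is extended with (!shvar, value).
        self.history.append(("!", shvar, value))

    def read(self, lvar, shvar):
        # GRead: shvar's value is unknown locally, so a fresh placeholder ph
        # is recorded as (?shvar, ph) and assigned to lvar.
        ph = f"ph{next(self.fresh) + 1}"
        self.history.append(("?", shvar, ph))
        self.local[lvar] = ph
        return ph


# Labels 1-4 of Proc1 in Figure 9.
p = LocalHistory()
p.write("taking1", True)
ph = p.read("ftok2", "token2")
p.write("token1", ("+", ph, 1))  # token1 := ftok2 + 1, kept symbolic
p.write("taking1", False)
print(p.history)
# [('!', 'taking1', True), ('?', 'token2', 'ph1'), ('!', 'token1', ('+', 'ph1', 1)), ('!', 'taking1', False)]
```

The resulting history has exactly the shape $Q_1; R_1; T_1; U_1$ used in the invariants of the bakery example.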



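$$\frac{\{P_i \wedge h_{\mathit{Proc}_i} = \epsilon\}\ \mathit{Program}_i\ \{Q_i\}, \quad i = 1, \ldots, n}{\{P_1 \wedge \cdots \wedge P_n \wedge \mathit{shvar}_1 = v_1 \wedge \cdots \wedge \mathit{shvar}_m = v_m\}\ \mathit{Program}_1 \parallel \cdots \parallel \mathit{Program}_n\ \{Q_1 \wedge \cdots \wedge Q_n \wedge \mathit{Compat}(v_1, \ldots, v_m, h_{\mathit{Proc}_1}, \ldots, h_{\mathit{Proc}_n})\}}\ (\mathrm{ParComp})$$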



In parallel composition, each process is analyzed in isolation, with its history variable $h_{\mathit{Proc}_i}$ initially empty and some precondition $P_i$ on its local state. This gives a postcondition $Q_i$ for each process $\mathit{Proc}_i$, which also contains assertions on $h_{\mathit{Proc}_i}$. The precondition of the parallel composition rule is the conjunction of the individual processes' preconditions and the initial values of the shared variables $\mathit{shvar}_1$ to $\mathit{shvar}_m$. The postcondition of this rule is the conjunction of the individual processes' postconditions together with the predicate Compat, defined in Figure 4. $\mathit{Merge}(h_{\mathit{Proc}_1}, \ldots, h_{\mathit{Proc}_n})$ represents the set of all possible interleavings of the histories $h_{\mathit{Proc}_1}$ to $h_{\mathit{Proc}_n}$ such that the sequence



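$$\mathit{Compat}(v_1, \ldots, v_m, h_{\mathit{Proc}_1}, \ldots, h_{\mathit{Proc}_n}) \stackrel{\text{def}}{=}
\begin{cases}
\mathit{shvar}_1 = v_1 \wedge \cdots \wedge \mathit{shvar}_m = v_m & \text{if } h_{\mathit{Proc}_i} = \epsilon \text{ for all } i = 1, \ldots, n\\[4pt]
\exists h \in \mathit{Merge}(h_{\mathit{Proc}_1}, \ldots, h_{\mathit{Proc}_n}).\ \big(\forall j \le m.\ \mathit{shvar}_j = f_j(v_j, h)\big) & \text{otherwise}\\
\qquad \wedge\ \big(\forall k \le |h|.\ h[k] = (?\mathit{shvar}_j, ph_j) \Rightarrow ph_j = f_j(v_j, h[1..k-1])\big) &
\end{cases}$$

Fig. 4. Non-recursive definition of the Compat predicate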



of elements within each of them is preserved in the merged history. The function $f_j(v, h)$ returns the last value written to the variable $\mathit{shvar}_j$ in history $h$, or $v$ if no such write exists. Essentially, the predicate Compat generates a set of equality predicates, one for each merged history. The first conjunct in Compat's definition states that the final value of any shared variable $\mathit{shvar}_j$ is the last value written to $\mathit{shvar}_j$ in that merged history. The second conjunct relates the placeholder $ph_j$, corresponding to a read of the shared variable $\mathit{shvar}_j$, to the latest value written to $\mathit{shvar}_j$ just before this element in the merged history. This is sufficient to characterize all compatible merged histories, and therefore plays the central role in the proofs of §6 and of Appendix ??.
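For histories whose written values are concrete, Merge and the read conjunct of Compat can be enumerated directly. The Python sketch below, whose function names are hypothetical, generates all order-preserving interleavings, binds every read placeholder to $f_j(v_j, h[1..k-1])$, and collects the resulting placeholder valuations; the final-value conjunct is omitted for brevity. Applied to the store-buffering pattern, it shows that Compat rules out the outcome in which both reads return the initial value, which is exactly an SC guarantee.

```python
# Illustrative sketch: concrete written values, unique placeholder names.


def merges(*hs):
    """Merge(h_1, ..., h_n): all interleavings preserving each history's order."""
    if not any(hs):
        yield ()
        return
    for i, h in enumerate(hs):
        if h:
            rest = hs[:i] + (h[1:],) + hs[i + 1:]
            for tail in merges(*rest):
                yield (h[0],) + tail


def last_write(var, init, h):
    """f_j(v, h): last value written to var in h, or init if there is none."""
    for kind, x, val in reversed(h):
        if kind == "!" and x == var:
            return val
    return init


def compat_bindings(init, *hs):
    """Placeholder valuations allowed by Compat, one per merged history."""
    result = set()
    for h in merges(*hs):
        # h[:k] is the prefix h[1..k-1] (0-based slicing).
        binding = {ph: last_write(x, init[x], h[:k])
                   for k, (kind, x, ph) in enumerate(h) if kind == "?"}
        result.add(tuple(sorted(binding.items())))
    return result


# Store buffering: Proc1 writes x then reads y; Proc2 writes y then reads x.
h1 = (("!", "x", 1), ("?", "y", "a"))
h2 = (("!", "y", 1), ("?", "x", "b"))
print(sorted(compat_bindings({"x": 0, "y": 0}, h1, h2)))
# [(('a', 0), ('b', 1)), (('a', 1), ('b', 0)), (('a', 1), ('b', 1))]
```

The valuation a = b = 0 is absent because every merge places at least one write before the other process's read.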

Individual process histories contain a placeholder variable for every read. In Figure 5 we define, rather informally, a set of predicates over these histories in order to represent them succinctly in our proofs. Given a sequence $\sigma$, $\sigma\!\upharpoonright_{\mathit{type}}$ denotes the subsequence of $\sigma$ consisting of only the elements of that type. The predicate None! holds if the history does not contain any write to any shared variable. None!shvar and None?shvar hold if the history does not contain any write to, or read from, respectively, the variable shvar. $[P]^*$ and $[P]^+$ capture the regular structure of a history sequence by abstracting the placeholder variables. We also admit $P^+ \Rightarrow P^*$ as a relaxation on the history sequence, which is used in the consequence rule. We acknowledge that using these predicates without giving them a proper semantics is not fully justified; we do so solely to keep our proofs manageable, and leave a more formal treatment to future work.
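When every element predicate constrains a single history element, as in the proofs of Section 6, the abstractions of Figure 5 behave like regular expressions over history elements. The Python sketch below makes this concrete under that single-element assumption: each matcher maps a start position to the set of positions where a match can end. The combinator names `elem`, `seq`, `star`, `plus` and `holds` are hypothetical, and the element predicates simply ignore placeholder values, mirroring how $[P]^*$ abstracts them.

```python
# Illustrative sketch: element predicates of the bakery invariant Inv1.


def elem(pred):
    """Match exactly one history element satisfying pred."""
    return lambda h, i: {i + 1} if i < len(h) and pred(h[i]) else set()


def seq(*parts):
    """[P1; P2; ...]: concatenation."""
    def match(h, i):
        ends = {i}
        for part in parts:
            ends = {e2 for e in ends for e2 in part(h, e)}
        return ends
    return match


def star(p):
    """[P]*: zero or more repetitions (fixpoint over reachable end positions)."""
    def match(h, i):
        ends = frontier = {i}
        while frontier:
            frontier = {e2 for e in frontier for e2 in p(h, e)} - ends
            ends = ends | frontier
        return ends
    return match


def plus(p):
    """[P]+: one repetition followed by [P]*."""
    return seq(p, star(p))


def holds(p, h):
    """True if p matches the whole history h."""
    return len(h) in p(h, 0)


none_writes = star(elem(lambda e: e[0] == "?"))  # None!: a segment without writes

Q1 = elem(lambda e: e == ("!", "taking1", True))
R1 = elem(lambda e: e[:2] == ("?", "token2"))
T1 = elem(lambda e: e[:2] == ("!", "token1"))
U1 = elem(lambda e: e == ("!", "taking1", False))
P1 = elem(lambda e: e == ("!", "token1", 0))
iteration = seq(Q1, R1, T1, U1, none_writes, P1)

h = [("!", "taking1", True), ("?", "token2", "ph1"), ("!", "token1", "ph1+1"),
     ("!", "taking1", False), ("?", "taking2", "ph2"), ("?", "token1", "ph3"),
     ("?", "token2", "ph4"), ("!", "token1", 0)]
print(holds(plus(iteration), h), holds(star(iteration), []), holds(plus(iteration), []))
# True True False
```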



5 Relaxed Memory Model 

For the rest of this paper we consider the PSO memory model, which allows a write instruction to be reordered with later reads and later writes that operate on different variables. To simulate the effect of PSO, every thread is equipped with one buffer per shared variable. These buffers store the values written to the corresponding variable by this thread in queue (FIFO) order. Buffering the value of a write in a variable-specific queue simulates delaying the write past later reads and writes to different memory locations. If a thread executes a read of a shared variable shvar, the thread's local buffer for shvar is checked first. If this buffer


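$$\begin{aligned}
\mathrm{None?shvar} &\stackrel{\text{def}}{=} \lambda h.\ h\!\upharpoonright_{(?\mathit{shvar},\_)} = \epsilon \\
\mathrm{None!} &\stackrel{\text{def}}{=} \lambda h.\ h\!\upharpoonright_{(!\_,\_)} = \epsilon \\
\mathrm{None!shvar} &\stackrel{\text{def}}{=} \lambda h.\ h\!\upharpoonright_{(!\mathit{shvar},\_)} = \epsilon \\
[P; Q] &\stackrel{\text{def}}{=} \lambda h.\ \exists h_0, h_1.\ h = h_0.h_1 \wedge P(h_0) \wedge Q(h_1) \\
[P]^* &\stackrel{\text{def}}{=} \lambda h.\ h = \epsilon \vee \exists h_0, h_1.\ h = h_0.h_1 \wedge P(h_0) \wedge [P]^*(h_1) \\
[P]^+ &\stackrel{\text{def}}{=} \lambda h.\ \exists h_0, h_1.\ h_0 \neq \epsilon \wedge h = h_0.h_1 \wedge P(h_0) \wedge [P]^*(h_1)
\end{aligned}$$

Fig. 5. Predicates on an individual history variable $h_i$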

is non-empty, the most recently written value (at the tail) is returned; if the buffer is empty, the value is read from the global state. This memory model also provides an explicit fence instruction to prevent the reordering of any two instructions within a thread. Operationally, a fence can execute only after all the buffers of its thread have been flushed.

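For the relaxed semantics we need to modify the notion of state. A state or configuration under this memory model is of the form $(\mathcal{G}, \mathit{Tstore})$, where

$$\mathcal{G} \stackrel{\text{def}}{=} \mathit{shvar} \to \mathit{val}, \qquad \mathit{Tstore} \stackrel{\text{def}}{=} \mathit{Tid} \rightharpoonup \mathcal{L} \times \mathcal{B} \times \mathit{Program},$$

$$\mathcal{B} \stackrel{\text{def}}{=} \mathit{shvar} \to \sigma, \qquad \mathcal{L} \stackrel{\text{def}}{=} \mathit{lvar} \to \mathit{val}$$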

The only change with respect to the SC state is in the definition of the thread store: the range of this function now also contains a buffer store $\mathcal{B}$, which maps each shared variable to an ordered sequence of values, ranged over by $\sigma$. The function $[\![-]\!]_{\mathcal{G},\mathcal{L},\mathcal{B}} \in \mathit{exp} \to \mathcal{G} \to \mathcal{L} \to \mathcal{B} \to \mathit{val}$, defined in Figure 7, takes an expression and evaluates it based on the values stored in the global store, the thread-local store and the buffer store. The relaxed semantics is defined in Figure 6. For the constructs not shown in Figure 6, the semantics is the same as the SC semantics of Figure 3, except that the evaluation function $[\![-]\!]_{\mathcal{G},\mathcal{L}}$ is replaced by $[\![-]\!]_{\mathcal{G},\mathcal{L},\mathcal{B}}$. In the relaxed semantics, a thread $t$ executing a write to a shared variable shvar enqueues the value in $t$'s buffer for shvar. Any read of a shared variable shvar returns the latest value in the buffer for shvar, if any; if this buffer is empty, the read returns the value from the global state (memory). The Flush rule non-deterministically dequeues the oldest element from some buffer of some thread and updates the global state accordingly. Further, in the parallel composition rule, the requirement that the parent thread's buffers be empty ensures that instructions before and after the parallel composition are ordered. The same holds for the end command.
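The relaxed semantics can be prototyped as an exhaustive explorer. The Python sketch below is an illustration under simplifying assumptions rather than a faithful encoding of Figure 6: it covers only straight-line threads with constant stores and plain loads, keeps one FIFO queue per (thread, variable) pair as in the GW rule, reads the newest buffered value as in Figure 7, lets Flush commit the oldest value of any non-empty queue, and enables a fence only when all of the thread's queues are empty. The names `pso_outcomes` and `message_passing` are hypothetical. On the message-passing pattern it shows that PSO can make a later write visible before an earlier one unless a fence separates them.

```python
# Illustrative sketch of buffered PSO for straight-line threads only.


def set_at(tup, i, value):
    """Return a copy of tuple `tup` with position i replaced by `value`."""
    return tup[:i] + (value,) + tup[i + 1:]


def freeze(d):
    """Hashable snapshot of a dict."""
    return tuple(sorted(d.items()))


def pso_outcomes(threads, init):
    """Final register states of straight-line threads under buffered PSO."""
    finals, seen = set(), set()

    def run(pcs, bufs, mem, regs):
        if (pcs, bufs, mem, regs) in seen:
            return
        seen.add((pcs, bufs, mem, regs))
        moved, memd = False, dict(mem)
        for i, prog in enumerate(threads):
            queues = dict(bufs[i])  # non-empty per-variable FIFO queues
            for var, q in queues.items():  # (Flush) oldest value of one queue
                rest = {k: v for k, v in queues.items() if k != var}
                if len(q) > 1:
                    rest[var] = q[1:]
                run(pcs, set_at(bufs, i, freeze(rest)),
                    freeze({**memd, var: q[0]}), regs)
                moved = True
            if pcs[i] == len(prog):
                continue
            op, nxt = prog[pcs[i]], set_at(pcs, i, pcs[i] + 1)
            if op[0] == "w":  # (GW) enqueue at the tail of the variable's queue
                q = queues.get(op[1], ()) + (op[2],)
                run(nxt, set_at(bufs, i, freeze({**queues, op[1]: q})), mem, regs)
                moved = True
            elif op[0] == "r":  # (GR) newest buffered value, else memory
                q = queues.get(op[1], ())
                val = q[-1] if q else memd[op[1]]
                run(nxt, bufs, mem, set_at(regs, i, regs[i] + ((op[2], val),)))
                moved = True
            elif not queues:  # (Fence) all queues of the thread are empty
                run(nxt, bufs, mem, regs)
                moved = True
        if not moved:  # every thread finished and every queue flushed
            finals.add(regs)

    n = len(threads)
    run((0,) * n, (freeze({}),) * n, freeze(init), ((),) * n)
    return finals


def message_passing(fenced):
    """Writer: x := 1; [fence;] y := 1.  Reader: r1 := y; r2 := x."""
    writer = [("w", "x", 1)] + ([("fence",)] if fenced else []) + [("w", "y", 1)]
    reader = [("r", "y", "r1"), ("r", "x", "r2")]
    outcomes = pso_outcomes([writer, reader], {"x": 0, "y": 0})
    return {(dict(r[1])["r1"], dict(r[1])["r2"]) for r in outcomes}


print((1, 0) in message_passing(fenced=False))  # True: y became visible before x
print((1, 0) in message_passing(fenced=True))   # False
```

Under TSO the unfenced outcome (1, 0) would be impossible, since a single FIFO buffer per thread keeps the two writes in order; it is the per-variable queues that allow write-write reordering here.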

6 Examples 

We use our proof system to prove the correctness of Lamport's bakery algorithm [Lam74] for two processes and of Simpson's 4 slot algorithm [Sim90].



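$$\frac{C = \mathit{shvar} := \mathit{lexp};C' \qquad \mathcal{B}(\mathit{shvar}) = \sigma \qquad [\![\mathit{lexp}]\!]_{\mathcal{G},\mathcal{L},\mathcal{B}} = v \qquad \mathcal{B}' = \mathcal{B}[\mathit{shvar} := \sigma.v]}{(\mathcal{G}, \{(t, \mathcal{L}, \mathcal{B}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}, \mathcal{B}', C')\} \cup T)}\ (\mathrm{GW})$$

$$\frac{C = \mathit{lvar} := \mathit{lexp};C' \qquad [\![\mathit{lexp}]\!]_{\mathcal{G},\mathcal{L},\mathcal{B}} = v \qquad \mathcal{L}' = \mathcal{L}[\mathit{lvar} := v]}{(\mathcal{G}, \{(t, \mathcal{L}, \mathcal{B}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}', \mathcal{B}, C')\} \cup T)}\ (\mathrm{LRW})$$

$$\frac{C = \mathrm{join}(\mathit{lvar});C' \qquad [\![\mathit{lvar}]\!]_{\mathcal{G},\mathcal{L},\mathcal{B}} = t' \in \mathit{Tid} \qquad t' \notin \mathrm{dom}(T) \cup \{t\}}{(\mathcal{G}, \{(t, \mathcal{L}, \mathcal{B}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}, \mathcal{B}, C')\} \cup T)}\ (\mathrm{Join})$$

$$\frac{C = \mathit{lvar} := \mathit{shexp};C' \qquad [\![\mathit{shexp}]\!]_{\mathcal{G},\mathcal{L},\mathcal{B}} = v \qquad \mathcal{L}' = \mathcal{L}[\mathit{lvar} := v]}{(\mathcal{G}, \{(t, \mathcal{L}, \mathcal{B}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}', \mathcal{B}, C')\} \cup T)}\ (\mathrm{GR})$$

$$\frac{C = \mathbf{end} \qquad \forall \mathit{shvar} \in \mathrm{dom}(\mathcal{B}).\ \mathcal{B}(\mathit{shvar}) = \epsilon}{(\mathcal{G}, \{(t, \mathcal{L}, \mathcal{B}, C)\} \cup T) \to (\mathcal{G}, T)}\ (\mathrm{END})$$

$$\frac{\exists \mathit{shvar} \in \mathrm{dom}(\mathcal{B}).\ \mathcal{B}(\mathit{shvar}) = v.\sigma \qquad \mathcal{B}' = \mathcal{B}[\mathit{shvar} := \sigma] \qquad \mathcal{G}' = \mathcal{G}[\mathit{shvar} := v]}{(\mathcal{G}, \{(t, \mathcal{L}, \mathcal{B}, C)\} \cup T) \to (\mathcal{G}', \{(t, \mathcal{L}, \mathcal{B}', C)\} \cup T)}\ (\mathrm{Flush})$$

$$\frac{C = \mathbf{fence};C' \qquad \forall \mathit{shvar} \in \mathrm{dom}(\mathcal{B}).\ \mathcal{B}(\mathit{shvar}) = \epsilon}{(\mathcal{G}, \{(t, \mathcal{L}, \mathcal{B}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}, \mathcal{B}, C')\} \cup T)}\ (\mathrm{Fence})$$

$$\frac{\begin{array}{c}C = \mathit{Program}_1 \parallel \cdots \parallel \mathit{Program}_n;C' \qquad \forall \mathit{shvar} \in \mathrm{dom}(\mathcal{B}).\ \mathcal{B}(\mathit{shvar}) = \epsilon \\ t_1, \ldots, t_n \notin \mathrm{dom}(T) \cup \{t\} \qquad T' = T \cup \{(t_1, \emptyset, \emptyset, \mathit{Program}_1), \ldots, (t_n, \emptyset, \emptyset, \mathit{Program}_n)\}\end{array}}{(\mathcal{G}, \{(t, \mathcal{L}, \mathcal{B}, C)\} \cup T) \to (\mathcal{G}, \{(t, \mathcal{L}, \mathcal{B}, \mathrm{join}(t_1);\cdots;\mathrm{join}(t_n);C')\} \cup T')}\ (\mathrm{PARCOMP})$$

Fig. 6. Relaxed semantics for the programming language of Figure 2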



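$$[\![\mathit{shvar}]\!]_{\mathcal{G},\mathcal{L},\mathcal{B}} = \begin{cases} v & \text{if } \mathcal{B}(\mathit{shvar}) = \sigma.v \\ \mathcal{G}(\mathit{shvar}) & \text{otherwise} \end{cases}$$

$$[\![\mathit{lvar}]\!]_{\mathcal{G},\mathcal{L},\mathcal{B}} = \mathcal{L}(\mathit{lvar}), \qquad [\![e_1 \oplus e_2]\!]_{\mathcal{G},\mathcal{L},\mathcal{B}} = [\![e_1]\!]_{\mathcal{G},\mathcal{L},\mathcal{B}} \oplus [\![e_2]\!]_{\mathcal{G},\mathcal{L},\mathcal{B}}$$

Fig. 7. $[\![-]\!]_{\mathcal{G},\mathcal{L},\mathcal{B}}$ for the relaxed semantics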



- Lamport's algorithm has an unbounded data domain for its token variables, which makes its verification challenging for model checking based approaches. We prove that this algorithm satisfies the mutual exclusion property, i.e. it is never possible for both processes to be inside their critical sections simultaneously.

- Simpson's 4 slot algorithm implements a wait-free and lock-free atomic register for a concurrent reader and writer. The algorithm uses disjoint slots for reading and writing in the presence of interference. We prove that it is safe in the sense that the concurrent reader and writer never use the same slot in the presence of interference. We also prove that the values observed by successive reads are in the same order as written by the writer.

An important notation used in our proofs is the following: if $P$ and $Q$ are assertions about individual elements of a history sequence, then $P \prec Q$ denotes that the element satisfying $P$ appears before the element satisfying $Q$ in the history sequence.



6.1 Example: Lamport's Bakery Algorithm for Two Processes 



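$$\begin{aligned}
\mathit{Inv1} \stackrel{\text{def}}{=} \lambda h.\ \big(&\mathbf{let}\ ph_1, ph_2, ph_3, ph_4, P_1, Q_1, R_1, T_1, U_1, V_1, W_1, X_1\ \mathbf{in}\\
&P_1 = \lambda h.\ (!\mathit{token}_1, 0)(h)\ \mathbf{in}\\
&Q_1 = \lambda h.\ (!\mathit{taking}_1, \mathit{true})(h)\ \mathbf{in}\\
&R_1 = \lambda h.\ (?\mathit{token}_2, ph_1)(h)\ \mathbf{in}\\
&T_1 = \lambda h.\ (!\mathit{token}_1, ph_1 + 1)(h)\ \mathbf{in}\\
&U_1 = \lambda h.\ (!\mathit{taking}_1, \mathit{false})(h)\ \mathbf{in}\\
&V_1 = \lambda h.\ (?\mathit{taking}_2, ph_2)(h)\ \mathbf{in}\\
&W_1 = \lambda h.\ (?\mathit{token}_1, ph_3)(h)\ \mathbf{in}\\
&X_1 = \lambda h.\ (?\mathit{token}_2, ph_4)(h)\ \mathbf{in}\\
&[[Q_1;R_1;T_1;U_1;\mathrm{None!};P_1]^*;Q_1;R_1;T_1;U_1;\mathrm{None!};V_1;\mathrm{None!};W_1;X_1](h)\big)
\end{aligned}$$

$$\begin{aligned}
\mathit{Inv2} \stackrel{\text{def}}{=} \lambda h.\ \big(&\mathbf{let}\ ph'_1, ph'_2, ph'_3, ph'_4, P_2, Q_2, R_2, T_2, U_2, V_2, W_2, X_2\ \mathbf{in}\\
&P_2 = \lambda h.\ (!\mathit{token}_2, 0)(h)\ \mathbf{in}\\
&Q_2 = \lambda h.\ (!\mathit{taking}_2, \mathit{true})(h)\ \mathbf{in}\\
&R_2 = \lambda h.\ (?\mathit{token}_1, ph'_1)(h)\ \mathbf{in}\\
&T_2 = \lambda h.\ (!\mathit{token}_2, ph'_1 + 1)(h)\ \mathbf{in}\\
&U_2 = \lambda h.\ (!\mathit{taking}_2, \mathit{false})(h)\ \mathbf{in}\\
&V_2 = \lambda h.\ (?\mathit{taking}_1, ph'_2)(h)\ \mathbf{in}\\
&W_2 = \lambda h.\ (?\mathit{token}_1, ph'_3)(h)\ \mathbf{in}\\
&X_2 = \lambda h.\ (?\mathit{token}_2, ph'_4)(h)\ \mathbf{in}\\
&[[Q_2;R_2;T_2;U_2;\mathrm{None!};P_2]^*;Q_2;R_2;T_2;U_2;\mathrm{None!};V_2;\mathrm{None!};W_2;X_2](h)\big)
\end{aligned}$$

Fig. 8. Invariants Inv1 and Inv2 of Figure 9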



In Lamport's algorithm, each process $\mathit{Proc}_i$ operates on the shared variables $\mathit{token}_i$ and $\mathit{taking}_i$, of type integer and boolean respectively. When a process $\mathit{Proc}_i$ intends to enter the critical section, it first reads the value of the token of the other process, say $v$, and assigns $v + 1$ to its own token variable. $\mathit{ftok}_1$ and $\mathit{ftok}_2$ are local variables of $\mathit{Proc}_1$ that hold the values of $\mathit{token}_1$ and $\mathit{token}_2$ respectively. Similarly, $\mathit{stok}_1$ and $\mathit{stok}_2$ are local variables of $\mathit{Proc}_2$ for $\mathit{token}_1$ and $\mathit{token}_2$. $(a, b) < (a', b')$ denotes the lexicographic less-than relation, so $(\mathit{ftok}_2, 2) < (\mathit{ftok}_1, 1)$ iff $\mathit{ftok}_2 < \mathit{ftok}_1$, and $(\mathit{stok}_1, 1) < (\mathit{stok}_2, 2)$ iff $\mathit{stok}_1 \le \mathit{stok}_2$. The algorithm with inline assertions is shown in Figure 9, where Inv1 and Inv2 are as in Figure 8. Assertions are on the history $h_i$ and the local variables of the corresponding process; $h_i$ is abstracted using the predicates of Figure 5. Note that $V_1$, $W_1$ and $X_1$ do not appear inside the regular structure of the history abstracted by $[Q_1;R_1;T_1;U_1;\mathrm{None!};P_1]^*$. They appear in the last iteration of the loop, and therefore the variables updated inside the loops are assigned the placeholders corresponding to these reads, i.e. $ph_2$, $ph_3$ and $ph_4$. The same holds for the invariant of $\mathit{Proc}_2$. Subsequently, these elements are abstracted to None! in order to re-establish the loop invariant.
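The lexicographic comparisons in the loop conditions at labels 9 and 9' can be checked mechanically. The Python sketch below (the helper names `proc1_waits` and `proc2_waits` are hypothetical) relies on Python's built-in lexicographic ordering of tuples and confirms, over a small range of token values, that the process id breaks ties in favour of $\mathit{Proc}_1$: $(a, 2) < (b, 1)$ holds iff $a < b$, while $(a, 1) < (b, 2)$ holds iff $a \le b$.

```python
# Illustrative check of the lexicographic tie-breaking used in Figure 9.


def proc1_waits(ftok1, ftok2):
    """Loop condition at label 9: ftok2 != 0 and (ftok2, 2) < (ftok1, 1)."""
    return ftok2 != 0 and (ftok2, 2) < (ftok1, 1)


def proc2_waits(stok1, stok2):
    """Loop condition at label 9': stok1 != 0 and (stok1, 1) < (stok2, 2)."""
    return stok1 != 0 and (stok1, 1) < (stok2, 2)


for a in range(5):
    for b in range(5):
        assert ((a, 2) < (b, 1)) == (a < b)
        assert ((a, 1) < (b, 2)) == (a <= b)

# Equal non-zero tokens: Proc1 has priority, so only Proc2 waits.
print(proc1_waits(3, 3), proc2_waits(3, 3))  # False True
```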

Mutual exclusion proof. To prove the mutual exclusion property, we first state an assertion capturing this property:

$$\mathit{ME} \stackrel{\text{def}}{=} \Big(\big(\mathit{Inv1}''(h_1) \wedge \mathit{Inv2}''(h_2) \wedge \mathit{Compat}(0, 0, \mathit{false}, \mathit{false}, h_1, h_2)\big) = \mathit{false}\Big)$$






token1 := 0, token2 := 0, taking1 := false, taking2 := false

Proc1:
while (true) do
     {[Q1;R1;T1;U1;None!;P1]*(h1)}
 1.  taking1 := true
     {[[Q1;R1;T1;U1;None!;P1]*;Q1](h1)}
 2.  ftok2 := token2
 3.  token1 := ftok2 + 1
 4.  taking1 := false
     {[[Q1;R1;T1;U1;None!;P1]*;Q1;R1;T1;U1](h1)}
 5.  fdone2 := taking2
 6.  while (fdone2 ≠ false) do
         fdone2 := taking2
     od
 7.  ftok1 := token1
 8.  ftok2 := token2
     {Inv1'(h1) ≝ Inv1(h1) ∧ ftok1 = ph3 ∧ ftok2 = ph4 ∧ ph2 = false}
 9.  while (ftok2 ≠ 0 ∧ (ftok2, 2) < (ftok1, 1)) do
         ftok1 := token1
         ftok2 := token2
     od
     {Inv1''(h1) ≝ Inv1'(h1) ∧ ¬(ftok2 ≠ 0 ∧ (ftok2, 2) < (ftok1, 1))}
     Critical Section
 10. token1 := 0
     {[Q1;R1;T1;U1;None!;P1]+(h1)}
od

Proc2:
while (true) do
     {[Q2;R2;T2;U2;None!;P2]*(h2)}
 1'. taking2 := true
     {[[Q2;R2;T2;U2;None!;P2]*;Q2](h2)}
 2'. stok1 := token1
 3'. token2 := stok1 + 1
 4'. taking2 := false
     {[[Q2;R2;T2;U2;None!;P2]*;Q2;R2;T2;U2](h2)}
 5'. sdone1 := taking1
 6'. while (sdone1 ≠ false) do
         sdone1 := taking1
     od
 7'. stok1 := token1
 8'. stok2 := token2
     {Inv2'(h2) ≝ Inv2(h2) ∧ stok1 = ph'3 ∧ stok2 = ph'4 ∧ ph'2 = false}
 9'. while (stok1 ≠ 0 ∧ (stok1, 1) < (stok2, 2)) do
         stok1 := token1
         stok2 := token2
     od
     {Inv2''(h2) ≝ Inv2'(h2) ∧ ¬(stok1 ≠ 0 ∧ (stok1, 1) < (stok2, 2))}
     Critical Section
 10'. token2 := 0
     {[Q2;R2;T2;U2;None!;P2]+(h2)}
od

Fig. 9. Lamport's bakery algorithm for two processes



where $\mathit{Inv1}''(h_1)$ and $\mathit{Inv2}''(h_2)$ are the assertions inside the critical sections of $\mathit{Proc}_1$ and $\mathit{Proc}_2$, respectively. We prove ME in two steps. First we show that



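$$\mathit{Inv1}'(h_1) \wedge \mathit{Inv2}'(h_2) \wedge \mathit{Compat}(0, 0, \mathit{false}, \mathit{false}, h_1, h_2) \Rightarrow \mathit{Inter},$$

where

$$\begin{aligned}
\mathit{Inter} &\stackrel{\text{def}}{=} \mathit{Inter1} \wedge \mathit{Inter2} \wedge \mathit{Inter3}\\
\mathit{Inter1} &\stackrel{\text{def}}{=} \big(\mathit{ftok}_2 = 0 \Rightarrow \mathit{stok}_1 \neq 0 \wedge \mathit{stok}_1 < \mathit{stok}_2\big)\\
\mathit{Inter2} &\stackrel{\text{def}}{=} \big(\mathit{stok}_1 = 0 \Rightarrow \mathit{ftok}_2 \neq 0 \wedge \mathit{ftok}_2 < \mathit{ftok}_1\big)\\
\mathit{Inter3} &\stackrel{\text{def}}{=} \big(\mathit{ftok}_2 \neq 0 \wedge \mathit{stok}_1 \neq 0 \Rightarrow (\mathit{ftok}_1 \le \mathit{ftok}_2 \Rightarrow \mathit{stok}_1 \le \mathit{stok}_2)\big)
\end{aligned}$$

and then it is easy to show that

$$\mathit{Inter} \wedge \neg\big(\mathit{ftok}_2 \neq 0 \wedge (\mathit{ftok}_2, 2) < (\mathit{ftok}_1, 1)\big) \wedge \neg\big(\mathit{stok}_1 \neq 0 \wedge (\mathit{stok}_1, 1) < (\mathit{stok}_2, 2)\big) = \mathit{false}$$

Indeed, if $\mathit{ftok}_2 = 0$, Inter1 falsifies the third conjunct; if $\mathit{stok}_1 = 0$, Inter2 falsifies the second; otherwise the second conjunct gives $\mathit{ftok}_1 \le \mathit{ftok}_2$, so Inter3 gives $\mathit{stok}_1 \le \mathit{stok}_2$, which falsifies the third conjunct.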
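Proof of $\mathit{Inv1}'(h_1) \wedge \mathit{Inv2}'(h_2) \wedge \mathit{Compat}(0, 0, \mathit{false}, \mathit{false}, h_1, h_2) \Rightarrow \mathit{Inter1}$. We show this in two steps,

$$\mathit{Inv1}'(h_1) \wedge \mathit{Inv2}'(h_2) \wedge \mathit{Compat}(0, 0, \mathit{false}, \mathit{false}, h_1, h_2) \wedge \mathit{ftok}_2 = 0 \Rightarrow \mathit{stok}_1 \neq 0 \qquad (1)$$

and

$$\mathit{Inv1}'(h_1) \wedge \mathit{Inv2}'(h_2) \wedge \mathit{Compat}(0, 0, \mathit{false}, \mathit{false}, h_1, h_2) \wedge \mathit{ftok}_2 = 0 \Rightarrow \mathit{stok}_1 < \mathit{stok}_2 \qquad (2)$$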



Proof of (1). Assume that $\mathit{Inv1}'(h_1)$, $\mathit{Inv2}'(h_2)$, $\mathit{Compat}(0, 0, \mathit{false}, \mathit{false}, h_1, h_2)$ and $\mathit{ftok}_2 = 0$ hold. The only way to have $\mathit{ftok}_2 = ph_4 = 0$ is to place the last read of $\mathit{token}_2$ in $h_1$ (denoted by $X_1$) in the merged history after $P_2$ of, say, the $(k-1)$th iteration of $[Q_2;R_2;T_2;U_2;\mathrm{None!};P_2]$ and before $T_2$ of the $k$th iteration. Inv1 implies $T_1 \prec X_1$, hence the value of $\mathit{token}_1$ visible at $X_1$ is non-zero (it is $ph_1 + 1$, and token values are never negative). This value is also visible at $T_2$, because $X_1$ is placed before $T_2$ and the history $h_{\mathit{Proc}_1}$ contains no write to $\mathit{token}_1$ after $X_1$. Further, because Inv2 implies $T_2 \prec W_2$, the read of $\mathit{token}_1$ at $W_2$ in $\mathit{Proc}_2$ also observes this non-zero value and assigns it to $\mathit{stok}_1$. Therefore we have

$$\mathit{Inv1}'(h_1) \wedge \mathit{Inv2}'(h_2) \wedge \mathit{Compat}(0, 0, \mathit{false}, \mathit{false}, h_1, h_2) \wedge \mathit{ftok}_2 = 0 \Rightarrow \mathit{stok}_1 \neq 0$$

Of all the orderings of $\mathit{Proc}_1$ and $\mathit{Proc}_2$, only $T_1 \prec X_1$ and $T_2 \prec W_2$ are used in this part of the proof. Hence these two orderings are sufficient to prove (1).
Proof of (2). In order to have $\mathit{ftok}_2 = ph_4 = 0$, $X_1$ must be placed in the merged history after $P_2$ of, say, the $(k-1)$th iteration of $[Q_2;R_2;T_2;U_2;\mathrm{None!};P_2]$ and before $T_2$ of the $k$th iteration. Further, Inv1' implies $ph_2 = \mathit{false}$, and therefore $V_1$ (the read of $\mathit{taking}_2$ in the last iteration) must be placed after $U_2$ of some $(n-1)$th iteration and before $Q_2$ of the $n$th iteration, with $n \le k$. Note that $V_1$ cannot be placed after $U_2$ of the $k$th iteration, because $V_1 \prec X_1$ in the history of $\mathit{Proc}_1$. Inv1 implies $T_1 \prec V_1$, hence the value of $\mathit{token}_1$ visible at $V_1$ is non-zero, say $v$. As $V_1$ is placed before $Q_2$ of the $n$th iteration, where $n \le k$, the same value $v$ of $\mathit{token}_1$ is also visible at $Q_2$ of the $n$th iteration. Inv2 implies $Q_2 \prec R_2$, hence the same value $v$ of $\mathit{token}_1$ is also visible at $R_2$ of this and all subsequent iterations. Inv2 implies $R_2 \prec T_2$, and therefore $T_2$ writes $v + 1$ to $\mathit{token}_2$ in all subsequent iterations, resulting in $\mathit{stok}_2 = v + 1$. Further, the value visible at $\mathit{stok}_1$ in $\mathit{Proc}_2$ is still the last value written to $\mathit{token}_1$ by $\mathit{Proc}_1$, i.e. $v$. Therefore we get

$$\mathit{Inv1}'(h_1) \wedge \mathit{Inv2}'(h_2) \wedge \mathit{Compat}(0, 0, \mathit{false}, \mathit{false}, h_1, h_2) \wedge \mathit{ftok}_2 = 0 \Rightarrow \mathit{stok}_1 < \mathit{stok}_2$$

The only orderings used in this part of the proof are $V_1 \prec X_1$, $T_1 \prec V_1$, $R_2 \prec T_2$ and $Q_2 \prec R_2$. Hence, out of all the total orders in $h_1$ and $h_2$, these are sufficient to prove (2).
Proof of $\mathit{Inv1}' \wedge \mathit{Inv2}' \wedge \mathit{Compat}(0, 0, \mathit{false}, \mathit{false}, h_1, h_2) \Rightarrow \mathit{Inter2}$. This is symmetric to the previous proof and gives the following symmetric sufficient orderings: $T_2 \prec X_2$, $T_1 \prec W_1$, $V_2 \prec X_2$, $T_2 \prec V_2$, $Q_1 \prec R_1$ and $R_1 \prec T_1$.
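Once sufficient orderings are known, fence locations follow from the memory model. The Python sketch below shows one way to do this for PSO on a single loop iteration of $\mathit{Proc}_1$ treated as straight-line code; this ignores the loop back edges and the inner loops at labels 6 and 9, and it is not claimed to be the paper's placement procedure. Following Section 5, an ordering $A \prec B$, with $A$ earlier in program order, can be broken by PSO only when $A$ is a write and $B$ accesses a different variable. Each such pair must be separated by a fence, and a greedy interval-stabbing pass picks a small set of fence positions covering all of them. The orderings used are those listed for $\mathit{Proc}_1$ in the proofs of (1), (2) and Inter2; the names `PROC1`, `ORDERINGS`, `pso_may_reorder` and `place_fences` are hypothetical.

```python
# Illustrative sketch: straight-line approximation of one iteration of Proc1.

# (label, kind, variable) for labels 1-8 of Figure 9.
PROC1 = [("Q1", "w", "taking1"), ("R1", "r", "token2"), ("T1", "w", "token1"),
         ("U1", "w", "taking1"), ("V1", "r", "taking2"), ("W1", "r", "token1"),
         ("X1", "r", "token2")]

# Orderings used for Proc1 in the proofs of (1), (2) and Inter2.
ORDERINGS = [("T1", "X1"), ("V1", "X1"), ("T1", "V1"),
             ("T1", "W1"), ("Q1", "R1"), ("R1", "T1")]


def pso_may_reorder(first, second):
    """PSO may reorder a write with a later access to a different variable."""
    _, kind, var = first
    return kind == "w" and var != second[2]


def place_fences(prog, orderings):
    """Return the labels after which a fence is inserted (greedy stabbing)."""
    pos = {label: i for i, (label, _, _) in enumerate(prog)}
    intervals = sorted(
        ((pos[a], pos[b]) for a, b in orderings
         if pso_may_reorder(prog[pos[a]], prog[pos[b]])),
        key=lambda iv: iv[1])
    gaps = []  # gap g means "fence between prog[g] and prog[g + 1]"
    for lo, hi in intervals:
        if not any(lo <= g < hi for g in gaps):
            gaps.append(hi - 1)  # rightmost gap that still separates lo and hi
    return [prog[g][0] for g in gaps]


print(place_fences(PROC1, ORDERINGS))  # ['Q1', 'U1']
```

Under these assumptions, the three write-to-different-variable orderings ($Q_1 \prec R_1$, $T_1 \prec V_1$, $T_1 \prec X_1$) are covered by a fence after label 1 and a fence after label 4, while the remaining orderings are already preserved by PSO.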

Proof of $\mathit{Inv1}' \wedge \mathit{Inv2}' \wedge \mathit{Compat}(0, 0, \mathit{false}, \mathit{false}, h_1, h_2) \Rightarrow \mathit{Inter3}$. For $\mathit{ftok}_2 \neq 0$, $X_1$ must read the non-zero value written to $\mathit{token}_2$ at $T_2$; similarly, for $\mathit{stok}_1 \neq 0$, $W_2$ must read the non-zero value written to $\mathit{token}_1$ at $T_1$. Let $X_1$ read from $T_2$ of iteration $k_2$, and let $W_2$ read from $T_1$ of iteration $k_1$. The following possibilities arise, depending on whether or not $k_1$ and $k_2$ are the last iterations of $\mathit{Proc}_1$ and $\mathit{Proc}_2$.

k1 is not the last iteration and k2 is the last iteration: In order to have stok1 ≠ 0, W2 is placed after T1 of iteration k1 and before P1 of iteration k1 in the merged history. Inv2 implies T2 ≺ W2, and therefore the value written to token2 at T2 in the last iteration of Proc2 is some v ≠ 0, and it flows to W2. Inv1 implies P1 ≺ X1, and because Proc1 does not write to token2, X1 also sees the value of token2 as v ≠ 0. Inv1 further implies P1 ≺ R1 for R1 of any iteration greater than k1, and R1 ≺ T1, where R1 reads the value of token2 and T1 writes this value incremented by 1 back to token1. Therefore v + 1 is written to token1 at T1 in all iterations greater than k1. Further, T1 ≺ W1 implies that the same value v + 1 is also visible at W1. Therefore we get ftok1 > ftok2, so Inter3 trivially holds.

k1 is the last iteration and k2 is not the last iteration: In order to have ftok2 ≠ 0, X1 is placed after T2 of iteration k2 and before P2 of iteration k2 in the merged history. Inv1 implies T1 ≺ X1, and therefore the value written to token1 at T1 in the last iteration of Proc1 is some v ≠ 0, and it flows to X1. Inv2 implies P2 ≺ R2 for R2 of any iteration greater than k2, and R2 ≺ T2, where R2 reads the value of token1 and T2 writes this value incremented by 1 back to token2. Therefore v + 1 is written to token2 at T2 in all iterations greater than k2. Also, Inv2 implies T2 ≺ X2, which gives stok2 = v + 1. Further, Inv2 also implies P2 ≺ W2, hence the same value v of token1 that is visible at P2 is assigned to stok1. This results in stok1 < stok2, and hence the claim holds.

Both k1 and k2 are last iterations: In this case X1 is placed after T2 of the last iteration, and because T2 ≺ X2, ftok2 = stok2. Similarly, if W2 is placed after T1 of the last iteration, then from T1 ≺ W1 in Inv1 we have stok1 = ftok1. Therefore ftok1 < ftok2 ⇒ stok1 < stok2, and hence the claim holds.

Neither k1 nor k2 is the last iteration: We show that if ftok2 reads the value of token2 from iteration k2 of Proc2, then it is not possible for stok1 to read the value of token1 from an iteration k1 of Proc1 that is not the last iteration. Let X1 be placed after T2 and before P2 of iteration k2. T1 ≺ X1 implies that the value of token1 visible at X1 is the one written at T1 in the last iteration, and this value also becomes visible at P2 of iteration k2 because of the placement of X1. Further, P2 ≺ W2 implies that W2 sees the same value of token1, written at T1 by the last iteration of Proc1, which is not k1. A similar argument follows for W2 as well. Hence no compatible merged history exists for this case.
Finally, we collect the sufficient orderings used to prove Inter3.

- T1 ≺ X1, T1 ≺ W1, P1 ≺ X1, P1 ≺ R1 and R1 ≺ T1 for Proc1.

- T2 ≺ X2, T2 ≺ W2, P2 ≺ W2, P2 ≺ R2 and R2 ≺ T2 for Proc2.

In P1 ≺ R1 and P2 ≺ R2, R1 (respectively R2) comes from an iteration later than that of P1 (respectively P2).

Lamport's bakery algorithm under the PSO memory model. The PSO memory model allows write instructions in a process to be reordered with later write and read instructions operating on different addresses. We have the following sufficient instruction orderings needed to prove Inter, and hence mutual exclusion.

- T1 ≺ V1, T1 ≺ W1, T1 ≺ X1, P1 ≺ X1, V1 ≺ X1, Q1 ≺ R1, R1 ≺ T1

- T2 ≺ V2, T2 ≺ X2, T2 ≺ W2, P2 ≺ W2, V2 ≺ X2, Q2 ≺ R2, R2 ≺ T2

- P1 ≺ R1: if P1 is from iteration k, then R1 is from an iteration greater than k.

- P2 ≺ R2: if P2 is from iteration k, then R2 is from an iteration greater than k.

The PSO memory model preserves the ordering of any two instructions that are data or control dependent. Therefore T1 ≺ W1, T2 ≺ X2, R1 ≺ T1 and R2 ≺ T2 are also satisfied by PSO. Further, PSO also preserves the order of two read instructions, i.e. V1 ≺ X1 and V2 ≺ X2. A fence between T1 and V1 also satisfies the ordering T1 ≺ X1 and, symmetrically, a fence between T2 and V2 also satisfies the ordering T2 ≺ W2. Further, a fence between Q1 and R1 also orders P1 of the kth iteration before R1 of the (k+1)th and later iterations, and before X1. A fence between Q2 and R2 also orders P2 of the kth iteration before R2 of the (k+1)th and later iterations, and before W2. Therefore we only need two fence instructions per process, i.e. between Q1 and R1 and between T1 and V1 in Proc1, and symmetrically in Proc2.
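The two fences above repair orderings of the form "write, then read a different variable" (Q1 ≺ R1 and T1 ≺ V1 in Proc1). The Python sketch below illustrates, under assumptions of our own, why PSO can break such an ordering and why a fence restores it. It uses a small operational model that we assume here rather than take from the paper: each thread buffers its writes, a read returns the thread's newest buffered write to that address or else memory, the oldest buffered write to each address may drain to memory at any time, and a fence ("F") executes only when the thread's buffer is empty. The function outcomes, the instruction encoding and the variables x and y are hypothetical; the two-thread program has the same write-then-read shape as Q1 ≺ R1 but is not the bakery algorithm itself.

```python
# Hypothetical operational sketch of PSO (an assumption, not the paper's
# formalism): each thread buffers its writes, and the oldest buffered write to
# each address may drain to memory independently of writes to other addresses.
def outcomes(progs, init_mem=(("x", 0), ("y", 0))):
    """Return the set of final register valuations reachable by `progs`.
    Instructions: ("W", addr, value), ("R", addr, register), ("F",) = fence."""
    n = len(progs)
    start = ((0,) * n, ((),) * n, init_mem, ())
    seen, stack, finals = {start}, [start], set()
    while stack:
        pcs, bufs, mem, regs = stack.pop()
        if all(pcs[t] == len(progs[t]) for t in range(n)) and not any(bufs):
            finals.add(regs)
            continue
        mem_d, succs = dict(mem), []
        for t in range(n):
            buf = bufs[t]
            if pcs[t] < len(progs[t]):
                op = progs[t][pcs[t]]
                pcs2 = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op[0] == "W":  # buffer the write; memory is unchanged
                    bufs2 = bufs[:t] + (buf + ((op[1], op[2]),),) + bufs[t + 1:]
                    succs.append((pcs2, bufs2, mem, regs))
                elif op[0] == "R":  # newest own buffered write, else memory
                    pending = [v for a, v in buf if a == op[1]]
                    val = pending[-1] if pending else mem_d[op[1]]
                    regs2 = tuple(sorted({**dict(regs), op[2]: val}.items()))
                    succs.append((pcs2, bufs, mem, regs2))
                elif op[0] == "F" and not buf:  # fence waits for an empty buffer
                    succs.append((pcs2, bufs, mem, regs))
            for addr in {a for a, _ in buf}:  # drain the oldest write per address
                i = next(k for k, (a, _) in enumerate(buf) if a == addr)
                mem2 = tuple(sorted({**mem_d, addr: buf[i][1]}.items()))
                bufs2 = bufs[:t] + (buf[:i] + buf[i + 1:],) + bufs[t + 1:]
                succs.append((pcs, bufs2, mem2, regs))
        for s in succs:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return finals

# Write-then-read shape: each thread writes its own flag, then reads the other's.
sb = [[("W", "x", 1), ("R", "y", "r0")], [("W", "y", 1), ("R", "x", "r1")]]
fenced = [[("W", "x", 1), ("F",), ("R", "y", "r0")],
          [("W", "y", 1), ("F",), ("R", "x", "r1")]]
print((("r0", 0), ("r1", 0)) in outcomes(sb))      # True: both reads miss the other write
print((("r0", 0), ("r1", 0)) in outcomes(fenced))  # False: the fences forbid it
```

Without the fences both reads can return the initial values, an outcome that SC forbids; with a fence between each write and the following read, that outcome disappears, which is the role the fence between Q1 and R1 plays in Proc1.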

6.2 Example: Simpson's 4 slot algorithm [Sim90] 

This is a wait-free algorithm for concurrent access to a location by a single reader process and a single writer process. This location is simulated using a 2x2 array variable slot. If the reader is reading the data and the writer wants to write new data at the same time, then instead of waiting for the reader to complete, the writer writes to a different index of slot and indicates to the reader that it should read from this location in the subsequent read. Boolean variables such as reading and latest are used as indices (false as 0 and true as 1) to denote the row and column indices of the slot and index variables. Variable index is a two-element boolean array such that, for any s ∈ {true, false}, slot[s][index[s]] holds the latest data written by the writer in the row slot[s]. Variables fwp and fwindex are local to the writer process. Variables srp and srindex are local to the reader process. The algorithm, with inline assertions on the history and local variables, is shown in Figure 10. In order to simulate different invocations of the writer and the reader, the respective program is enclosed within a loop. We are interested in proving the following two properties of this algorithm.
Initially: reading := false, latest := false, slot[2][2] := {0}, index[2] := {false}

Inv1 ≝ λh. Let ph1, ph2, P1, Q1, R1, S1, T1 in
    P1 = λh. (?reading, ph1)(h) in
    Q1 = λh. (?index[¬ph1], ph2)(h) in
    R1 = λh. (!slot[¬ph1][¬ph2], _)(h) in
    S1 = λh. (!index[¬ph1], ¬ph2)(h) in
    T1 = λh. (!latest, ¬ph1)(h) in
    [P1; Q1; R1; S1; T1]*(h) ∧ None!reading(h)

Writer
while (true) do
    {Inv1(h1)}
    fwp := ¬reading
    fwindex := ¬index[fwp]
    {Inv1' ≝ [[P1; Q1; R1; S1; T1]*; (?reading, ph1); (?index[¬ph1], ph2)](h1)
        ∧ fwp = ¬ph1 ∧ fwindex = ¬ph2 ∧ None!reading(h1)}
    Critical Section: write to slot[fwp][fwindex]
    index[fwp] := fwindex
    latest := fwp
    {[P1; Q1; R1; S1; T1]+(h1)}
od

Inv2 ≝ λh. Let ph'1, ph'2, P2, Q2, R2, S2 in
    P2 = λh. (?latest, ph'1)(h) in
    Q2 = λh. (!reading, ph'1)(h) in
    R2 = λh. (?index[ph'1], ph'2)(h) in
    S2 = λh. (?slot[ph'1][ph'2], _)(h) in
    [P2; Q2; R2; S2]*(h) ∧ None!index(h) ∧ None!latest(h) ∧ None!slot(h)

Reader
while (true) do
    {Inv2(h2)}
    srp := latest
    reading := srp
    srindex := index[srp]
    {Inv2' ≝ [[P2; Q2; R2; S2]*; (?latest, ph'1); (!reading, ph'1); (?index[ph'1], ph'2)](h2)
        ∧ srp = ph'1 ∧ srindex = ph'2 ∧ None!index(h2) ∧ None!latest(h2) ∧ None!slot(h2)}
    Critical Section: read from slot[srp][srindex]
    {[P2; Q2; R2; S2]+(h2)}
od

Fig. 10. Simpson's 4 slot algorithm



Interference Freedom. We want to show that at the entry points of the critical sections (of the writer and the reader) either fwp ≠ srp, i.e. the reader and the writer use different rows of the slot variable, or fwp = srp ⇒ fwindex ≠ srindex, i.e. if both use the same row of the slot variable, then they read from and write to different columns of that row. Therefore, we want to prove

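$$\mathit{Inv1}'(h_1) \wedge \mathit{Inv2}'(h_2) \wedge \mathit{Compat}(\mathit{false}, \mathit{false}, \{0\}, \{\mathit{false}\}, h_1, h_2) \implies \mathit{fwp} \neq \mathit{srp} \vee (\mathit{fwp} = \mathit{srp} \implies \mathit{fwindex} \neq \mathit{srindex})$$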

where Inv1' and Inv2' are the assertions just before the program points where the writer is going to write the data and the reader is going to read the data.

Proof. In any compatible merged history h of h1 and h2, the placement of the last write (!reading, ph'1) of h2 has two choices with respect to the last read (?reading, ph1) of h1.
- The last write (!reading, ph'1) of h2 is placed before the last read (?reading, ph1) of h1: As the writer does not write to reading, ph1 = ph'1, or equivalently ¬fwp = srp (from the assertions fwp = ¬ph1 and srp = ph'1), and therefore fwp ≠ srp. Hence proved.
- The last write (!reading, ph'1) of h2 is placed after the last read (?reading, ph1) of h1: This, along with the assumption fwp = srp, implies ¬ph1 = ph'1. Therefore, in Inv2', (?index[ph'1], ph'2) is the same as (?index[¬ph1], ph'2). According to Inv1', (?index[¬ph1], ph2) is in h1. We want to establish the relation between ph2 and ph'2.

Inv1' implies S1 ≺ (?reading, ph1), hence there is no write to the variable index beyond (?reading, ph1) in the merged history. Hence the value at index[¬ph1] is the same for both processes, i.e. ph2 = ph'2. From Inv1' and Inv2' we get fwindex = ¬ph2 and srindex = ph'2, which implies fwindex ≠ srindex. Hence proved.
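For small bounds, the case analysis above can also be cross-checked mechanically. The following Python sketch is our own illustration, not part of the proof method of this paper: it enumerates every sequentially consistent interleaving of a bounded number of writer and reader invocations of Figure 10, treats each assignment and each slot access as one atomic step, encodes false/true as 0/1, and collects the states in which the writer is about to write slot[fwp][fwindex] while the reader is about to read slot[srp][srindex] and the two coincide. The name find_collisions and the flag negate_index are hypothetical; setting negate_index=False drops the negation in fwindex := ¬index[fwp], which shows that the check does detect a collision when the algorithm is altered.

```python
from collections import deque

def find_collisions(max_writes, max_reads, negate_index=True):
    """Explore all SC interleavings of up to max_writes writer invocations and
    max_reads reader invocations of the 4-slot algorithm (booleans as 0/1).
    Return every state in which the writer is about to write
    slot[fwp][fwindex] while the reader is about to read slot[srp][srindex]
    and the two slots coincide. Slot contents are not modelled."""
    # State layout (all ints):
    # 0 wpc, 1 writes_done, 2 fwp, 3 fwindex,
    # 4 rpc, 5 reads_done, 6 srp, 7 srindex,
    # 8 reading, 9 latest, 10 index[0], 11 index[1]
    start = (0,) * 12  # reading = latest = false, index = {false, false}
    seen, todo, collisions = {start}, deque([start]), []
    while todo:
        s = todo.popleft()
        wpc, wdone, fwp, fwi, rpc, rdone, srp, sri, reading, latest = s[:10]
        index = s[10:]
        if wpc == 2 and rpc == 3 and (fwp, fwi) == (srp, sri):
            collisions.append(s)
        successors = []
        if wdone < max_writes:                 # one writer step
            w = list(s)
            if wpc == 0:
                w[2] = 1 - reading             # fwp := not reading
            elif wpc == 1:                     # fwindex := not index[fwp]
                w[3] = 1 - index[fwp] if negate_index else index[fwp]
            elif wpc == 3:
                w[10 + fwp] = fwi              # index[fwp] := fwindex
            elif wpc == 4:
                w[9] = fwp                     # latest := fwp
                w[1] += 1                      # invocation finished
            # wpc == 2 is the slot write itself; it changes no control state
            w[0] = (wpc + 1) % 5
            successors.append(tuple(w))
        if rdone < max_reads:                  # one reader step
            r = list(s)
            if rpc == 0:
                r[6] = latest                  # srp := latest
            elif rpc == 1:
                r[8] = srp                     # reading := srp
            elif rpc == 2:
                r[7] = index[srp]              # srindex := index[srp]
            elif rpc == 3:
                r[5] += 1                      # slot read done; invocation finished
            r[4] = (rpc + 1) % 4
            successors.append(tuple(r))
        for t in successors:
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return collisions

print(find_collisions(3, 3))                               # []
print(len(find_collisions(2, 1, negate_index=False)) > 0)  # True
```

An empty result only confirms the property up to the chosen bounds and under the atomicity assumption above; it is a bounded check, not a replacement for the proof.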

From the above proof we can see that the only ordering important in proving this property is S1 ≺ P1, where P1 is from a later iteration than that of S1. Now we prove another property of interest and find out the program orderings used in that proof.
Consistent Reads. The second property of interest is related to the order of reads. It specifies that the values read by the reader form a stuttering sequence of the values written by the writer, i.e. if the writer writes the sequence 1, 2, 3, 4, 5 in subsequent invocations of write, then the reader cannot observe 1, 4, 3, 5 as a sequence of reads. It must observe a sequence that preserves the order of writes, possibly interspersed with repetitions of the same data. First, we define some notation used in this section. Let MergedHist(Reader^R, Writer^W) be the set of all compatible merged histories consisting of R invocations of the reader process and W invocations of the writer process. Let Reader(n) and Writer(n) be the nth invocations of the reader and the writer respectively. Let D_k be the data written by the writer in the kth invocation. Let D(w) be the sequence D_1.D_2. ... .D_w of values written by w consecutive iterations of the writer. For s ∈ {true, false}, s̄ denotes the negation of s. Let elem^r_Reader denote an element elem ∈ {(?_, _), (!_, _)} of the merged history from the rth invocation of the Reader. Similarly, elem^w_Writer denotes the same for the wth iteration of the Writer.

Stuttering sequence. Let S_{r,w} be a sequence of length r constructed from the elements of D(w). S_{r,w} is a stuttering sequence of D(w) if, for any index i of S_{r,w} such that S_{r,w}[i-1] = D_{k1} and S_{r,w}[i] = D_{k2}, we have k1 ≤ k2.
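The definition can be stated as a small predicate. In this Python sketch (the function name is_stuttering is hypothetical), each element of S_{r,w} is represented by the index k of the writer invocation whose data D_k it is, since the definition compares these indices rather than the data values; the indices must come from 1..w and may repeat but never decrease.

```python
def is_stuttering(indices, w):
    """indices[i] is the k of the datum D_k at position i of S_{r,w}.
    True iff every k lies in 1..w and consecutive k's never decrease."""
    if any(not 1 <= k <= w for k in indices):
        return False  # element not drawn from D(w)
    return all(k1 <= k2 for k1, k2 in zip(indices, indices[1:]))

print(is_stuttering([1, 4, 3, 5], 5))        # False: D_4 is followed by D_3
print(is_stuttering([1, 1, 2, 4, 4, 5], 5))  # True: order kept, with repetition
```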

Some interesting properties of the Reader and the Writer processes.

Only the writer writes to latest and reads from the reading variable. Further, only the reader writes to reading and reads from latest. Also, the value written to reading by the reader in any invocation is the same as the value it read from latest. The value written to latest by the writer in any invocation is the negation of the value that it read from reading.

The following lemma characterizes the sequence of values written to reading in a segment of the merged history. This characterization is then used in the proof of Lemma 2.
Lemma 1. For all R, W, h ∈ MergedHist(Reader^R, Writer^W), r ≤ R, w ≤ W, s ∈ {true, false}, if Reader(r) reads the value of latest written by Writer(w) as s, then in the sequence of values written to the variable reading between P1 of Writer(w) and P2 of Reader(r), s is never followed by s̄.

Merged history, in order: A'': (!latest, s̄) of Writer(n'); B'': (?latest, s̄) of Reader(r'); B': (!reading, s̄) of Reader(r'); A': (?reading, s̄) of Writer(n+1); A: (!latest, s) of Writer(n+1); B: (?latest, s) of Reader(r).

Fig. 11. Merged history for Lemma 1



Proof. We prove it using induction on the iteration number of the writer process.

Base case, w = 0: If Reader(r) reads the initial value of latest (in the 0th iteration of the writer process), say s, then all the invocations of the reader before Reader(r) also see this initial value of latest as s, and therefore the sequence is made of only this value. Hence the base case satisfies this property.
Induction hypothesis, w ≤ n: For all w ≤ n, if Reader(r) reads the value of latest as s, written by Writer(w), then the sequence of values written to reading between P1 of Writer(w) and P2 of Reader(r) satisfies this property.
Induction step, w = n + 1: Consider the merged history of Figure 11, where Reader(r) reads the value of latest (denoted by B) from Writer(n+1) (denoted by A). We want to prove this property for the sequence of values written to reading between A' and B, where A' denotes P1 of Writer(n+1) in Figure 11. P1 ≺ T1 implies that A' appears before A in the merged history. Let Writer(n+1) read the value of reading from Reader(r') (denoted by B'). P2 ≺ Q2 implies that P2 of Reader(r') appears before B' in the merged history. Let Reader(r') at P2 read the value of latest from Writer(n') (A''). Clearly n' < n+1, hence from the induction hypothesis we know that in the sequence of values written to reading between A'' and B'', s̄ is never followed by s. We use this knowledge to characterize the values written to latest between B'' and A'. From the writer process we know that the value written to latest is the negation of the value read from the reading variable. Therefore, in the sequence of values written to latest between B'' and A', s is never followed by s̄. Further, we also know that the reader process writes to reading the same value that it reads from the variable latest. Therefore, in the sequence of values written to reading between A' and B, s is never followed by s̄. Hence proved.

For any reader, we establish the relation between its read of the variable latest and its read of the data. More formally:

Lemma 2. For all R, W, h ∈ MergedHist(Reader^R, Writer^W), r ≤ R, w ≤ W, s ∈ {true, false}, if Reader(r) reads the value of latest as s written by Writer(w) and reads the value of index[s] written by Writer(w'), then w' ≥ w, Reader(r) reads the data D_{w'}, and all the invocations of the writer from w to w' write the data in row s of the slot variable.

Proof. Let us assume that Reader(r) reads the value of latest as s written by Writer(w), which means that P2 of Reader(r) appears after T1 of Writer(w) and no write to latest appears between these two points in the merged history. As Writer(w) also writes to index[s], and S1 ≺ T1 and P2 ≺ R2, Reader(r) reads the value of index[s] either from Writer(w) or from some Writer(w') such that w' > w. We prove the following two properties:
- Reader(r) reads the data written by Writer(w') in slot[s][k], where k is the value written to index[s] by Writer(w').
Proof: If Reader(r) reads index[s] as k ∈ {true, false} from Writer(w'), then R1 ≺ S1 implies that Writer(w') has also written the data in slot[s][k]. We want to show that no subsequent invocation of the writer writes in slot[s][k] before S2 of Reader(r). The assumption that R2 of Reader(r) reads the value of index[s] from Writer(w') implies that there is no write to index[s] between S1 of Writer(w') and R2 of Reader(r). The ordering S1 ≺ P1 in Inv1', where P1 is from a later iteration than that of S1, implies that the writer can be invoked at most once after iteration w' and before R2 of Reader(r). If it were invoked more than once, a write to index[s] would appear after S1 of Writer(w') and before R2 of Reader(r), contradicting our assumption. Further, if the next invocation after w' happens, then it writes the data to column k̄ of row s in the slot variable, still satisfying the property. If any further invocation of the writer happens after R2 and before S2 of Reader(r), then because of Q2 ≺ R2 it observes the value of reading as s and therefore writes the data to row s̄ of the slot variable. Inv2' implies that S2 of Reader(r) ≺ Q2 of subsequent invocations of the reader, and therefore no write to reading exists between R2 and S2 of Reader(r). As a result, all invocations of the writer between R2 and S2 of Reader(r) observe the same value s of reading and therefore write the data to row s̄.
- All the invocations of the writer from w to w' write the data in row s of the slot variable.
Proof: We prove an alternative, equivalent property: all invocations of the writer from w to w' read the value of reading as s̄. This follows from the property that the writer writes in the row of the slot variable obtained by negating the value of the variable reading. From the assumption, Writer(w') writes to index[s], and therefore it also reads the value of reading as s̄; the same holds for Writer(w) as well. We need to show that this property holds for the remaining invocations between w and w'. We prove it by contradiction, assuming that there exists a w'' such that w < w'' < w' and P1 of Writer(w'') reads the value of reading as s. Clearly, P1 of Writer(w'') appears after P1 of Writer(w) and before P1 of Writer(w'). From the earlier argument, we know that the value of reading visible at P1 of Writer(w) is s̄. Hence, P1 of Writer(w'') cannot read the value of reading visible at P1 of Writer(w). Therefore, it must read the value s of reading from the sequence of values written to reading between P1 of Writer(w) and P2 of Reader(r). Following Lemma 1, s is never followed by s̄ in the sequence of values written to reading between P1 of Writer(w) and P2 of Reader(r). This further implies that P1 of Writer(w') cannot read the value of reading as s̄, a contradiction to our assumption. Therefore all invocations of the writer from w to w' read the value of reading as s̄. Hence proved.

Now we use Lemma 2 to prove the following theorem. 

Theorem 1 (Stuttering Sequence). Stutter(n) = for all n, w ∈ ℕ and h ∈ MergedHist(Reader^n, Writer^w), the sequence S_{n,w}, constructed from the elements of D(w), is a stuttering sequence of D(w).

Proof. Base case, n = 0: With no history corresponding to the Reader in the merged history, the empty sequence trivially forms a stuttering sequence.
Induction hypothesis: For all r ≤ n, Stutter(r) holds.
Induction step, r = n + 1:

Let Reader(n) read the value of latest from Writer(w_a). From Lemma 2, we know that Reader(n) reads the value D_{w'} such that w_a ≤ w' ≤ w, and that all the invocations of the writer from w_a to w' write the data in the same row s of the slot variable. Further, the value of index[s] visible at R2 of Reader(n) is from S1 of Writer(w'). We know that the writes to latest from consecutive iterations are totally ordered, and the same holds for consecutive reads from latest as well. Therefore, Reader(n + 1) reads the value of latest from some Writer(w_b) such that w_b ≥ w_a. There are two possibilities, depending on whether this value is s or s̄.

- Reader(n + 1) reads the value of latest as s from Writer(w_b) such that w_b ≥ w_a: From our assumption we know that Reader(n) sees the value of index[s] from Writer(w'); therefore the value of index[s] visible at R2 of Reader(n + 1) will be from some Writer(w'') such that w ≥ w'' ≥ w'. Following Lemma 2, the data read by Reader(n + 1) from slot[s][index[s]] will be from Writer(w''), and w'' ≥ w' implies that the resulting sequence S_{n+1,w} is still a stuttering sequence.
- Reader(n + 1) reads the value of latest as s̄ from Writer(w_b) such that w_b > w_a: From Lemma 2, we know that all the invocations of the writer from w_a to w' write the data in the same row s of slot, and because Writer(w_b) writes the data in row s̄ of slot, w_b > w'. Following Lemma 2, Reader(n + 1) reads the data from some Writer(w'') such that w'' ≥ w_b. Combining these two, we get w'' > w', where Reader(n) reads D_{w'} and Reader(n + 1) reads D_{w''}. Therefore the resulting sequence S_{n+1,w} is still a stuttering sequence.

Simpson's 4 slot algorithm under the PSO memory model. The following per-thread instruction orderings are used in the proofs of the interference freedom and the consistent reads properties.

- P1 ≺ T1 (from the proof of Lemma 1), S1 ≺ T1, R1 ≺ S1

- P2 ≺ Q2 (from the proof of Lemma 1), P2 ≺ R2, Q2 ≺ R2

- S1 ≺ P1, where P1 is from an iteration later than that of S1

- S2 ≺ Q2, where Q2 is from an iteration later than that of S2.

Out of all these orderings, P1 ≺ T1, P2 ≺ Q2 and P2 ≺ R2 are data-dependent orders, which are respected by the PSO memory model. S2 ≺ Q2, where Q2 is from an iteration later than that of S2, is also respected by the PSO memory model, because S2 corresponds to a read and Q2 to a write instruction, and PSO only relaxes orderings whose earlier instruction is a write. Therefore, we have to enforce only R1 ≺ S1, S1 ≺ T1, Q2 ≺ R2, and S1 ≺ P1 where P1 is from an iteration later than that of S1. Following the semantics of fence, it is sufficient to put two fence instructions in the writer: one between R1 and S1, and another between S1 and T1, which also orders S1 before P1 of every later iteration. Further, we need one fence instruction between Q2 and R2 in the reader as well.
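Choosing where to put the fences can be phrased as covering intervals of program-order gaps, and the Python sketch below reproduces the placement above under assumptions of our own. An ordering a ≺ b inside one iteration needs a fence in some gap between a and b; an ordering whose b lies in a later iteration (S1 ≺ P1, S2 ≺ Q2) is conservatively required to be fenced after a within the same loop body; PSO is taken to break an ordering only when a is a write to a different variable and the pair is not data or control dependent, as described earlier in this section; and variables are compared by name, which is adequate here only because no required pair accesses the same array. Placing a fence at the right end of the earliest-ending uncovered interval is the standard greedy algorithm for minimum interval stabbing. The names Instr, pso_may_break and place_fences are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    kind: str  # "R" (read) or "W" (write)
    var: str   # variable accessed; arrays are identified by name only

def pso_may_break(a, b, dependent):
    # Assumed PSO rule: a write may be reordered with a later read or write of
    # another variable unless the pair is data or control dependent.
    return a.kind == "W" and a.var != b.var and not dependent

def place_fences(body, orderings):
    """body: one loop iteration in program order.
    orderings: (a, b, dependent, cross_iteration) tuples of instruction names.
    Returns gap indices g, meaning a fence right after body[g]
    (g == len(body) - 1 means at the end of the loop body)."""
    pos = {ins.name: i for i, ins in enumerate(body)}
    intervals = []
    for a, b, dependent, cross in orderings:
        if pso_may_break(body[pos[a]], body[pos[b]], dependent):
            last_gap = len(body) - 1 if cross else pos[b] - 1
            intervals.append((pos[a], last_gap))
    fences = []
    for lo, hi in sorted(intervals, key=lambda iv: iv[1]):
        if not any(lo <= f <= hi for f in fences):
            fences.append(hi)  # greedy: stab at the interval's right end
    return fences

writer = [Instr("P1", "R", "reading"), Instr("Q1", "R", "index"),
          Instr("R1", "W", "slot"), Instr("S1", "W", "index"),
          Instr("T1", "W", "latest")]
writer_orders = [("P1", "T1", True, False), ("S1", "T1", False, False),
                 ("R1", "S1", False, False), ("S1", "P1", False, True)]
reader = [Instr("P2", "R", "latest"), Instr("Q2", "W", "reading"),
          Instr("R2", "R", "index"), Instr("S2", "R", "slot")]
reader_orders = [("P2", "Q2", True, False), ("P2", "R2", True, False),
                 ("Q2", "R2", False, False), ("S2", "Q2", False, True)]

for body, orders in ((writer, writer_orders), (reader, reader_orders)):
    print(["fence after " + body[g].name for g in place_fences(body, orders)])
# prints ['fence after R1', 'fence after S1'] then ['fence after Q2']
```

The fence after S1 also covers S1 ≺ P1 of the next iteration, which is why the writer needs only two fences and the reader one.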



7 Conclusion and Future Work 

In this paper we proved Simpson's 4 slot algorithm correct under the SC memory model with respect to the interference freedom and the consistent reads properties. Based on these proofs, we identified the locations of the fence instructions needed to satisfy these two properties under the PSO memory model. As a direction for future work, we still have to explore the use of this approach for more advanced memory models that support non-atomic writes (POWER/ARM). This paper introduces predicates over history variables only to carry out proofs conveniently. No formal treatment is given to them, and this should be addressed with high priority. Even in the presence of predicates over history variables, the difficulty of carrying out these proofs without any tool support is evident from the proof of Simpson's 4 slot algorithm. We plan to use proof assistants for this in the near future. In all the examples we considered here, a program is always executed by only one thread. This restriction needs to be addressed in order to handle concurrent data structures, where many threads can execute the same method of an object.